Import GraphLab


In [5]:
import graphlab

Load our dataset


In [6]:
sales = graphlab.SFrame('home_data.gl/')



In [7]:
sales


Out[7]:
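+------------+---------------------------+---------+----------+-----------+-------------+----------+--------+------------+
| id         | date                      | price   | bedrooms | bathrooms | sqft_living | sqft_lot | floors | waterfront |
+------------+---------------------------+---------+----------+-----------+-------------+----------+--------+------------+
| 7129300520 | 2014-10-13 00:00:00+00:00 | 221900  | 3        | 1         | 1180        | 5650     | 1      | 0          |
| 6414100192 | 2014-12-09 00:00:00+00:00 | 538000  | 3        | 2.25      | 2570        | 7242     | 2      | 0          |
| 5631500400 | 2015-02-25 00:00:00+00:00 | 180000  | 2        | 1         | 770         | 10000    | 1      | 0          |
| 2487200875 | 2014-12-09 00:00:00+00:00 | 604000  | 4        | 3         | 1960        | 5000     | 1      | 0          |
| 1954400510 | 2015-02-18 00:00:00+00:00 | 510000  | 3        | 2         | 1680        | 8080     | 1      | 0          |
| 7237550310 | 2014-05-12 00:00:00+00:00 | 1225000 | 4        | 4.5       | 5420        | 101930   | 1      | 0          |
| 1321400060 | 2014-06-27 00:00:00+00:00 | 257500  | 3        | 2.25      | 1715        | 6819     | 2      | 0          |
| 2008000270 | 2015-01-15 00:00:00+00:00 | 291850  | 3        | 1.5       | 1060        | 9711     | 1      | 0          |
| 2414600126 | 2015-04-15 00:00:00+00:00 | 229500  | 3        | 1         | 1780        | 7470     | 1      | 0          |
| 3793500160 | 2015-03-12 00:00:00+00:00 | 323000  | 3        | 2.5       | 1890        | 6560     | 2      | 0          |
+------------+---------------------------+---------+----------+-----------+-------------+----------+--------+------------+
+------+-----------+-------+------------+---------------+----------+--------------+---------+-------------+
| view | condition | grade | sqft_above | sqft_basement | yr_built | yr_renovated | zipcode | lat         |
+------+-----------+-------+------------+---------------+----------+--------------+---------+-------------+
| 0    | 3         | 7     | 1180       | 0             | 1955     | 0            | 98178   | 47.51123398 |
| 0    | 3         | 7     | 2170       | 400           | 1951     | 1991         | 98125   | 47.72102274 |
| 0    | 3         | 6     | 770        | 0             | 1933     | 0            | 98028   | 47.73792661 |
| 0    | 5         | 7     | 1050       | 910           | 1965     | 0            | 98136   | 47.52082    |
| 0    | 3         | 8     | 1680       | 0             | 1987     | 0            | 98074   | 47.61681228 |
| 0    | 3         | 11    | 3890       | 1530          | 2001     | 0            | 98053   | 47.65611835 |
| 0    | 3         | 7     | 1715       | 0             | 1995     | 0            | 98003   | 47.30972002 |
| 0    | 3         | 7     | 1060       | 0             | 1963     | 0            | 98198   | 47.40949984 |
| 0    | 3         | 7     | 1050       | 730           | 1960     | 0            | 98146   | 47.51229381 |
| 0    | 3         | 7     | 1890       | 0             | 2003     | 0            | 98038   | 47.36840673 |
+------+-----------+-------+------------+---------------+----------+--------------+---------+-------------+
+---------------+---------------+------------+
| long          | sqft_living15 | sqft_lot15 |
+---------------+---------------+------------+
| -122.25677536 | 1340.0        | 5650.0     |
| -122.3188624  | 1690.0        | 7639.0     |
| -122.23319601 | 2720.0        | 8062.0     |
| -122.39318505 | 1360.0        | 5000.0     |
| -122.04490059 | 1800.0        | 7503.0     |
| -122.00528655 | 4760.0        | 101930.0   |
| -122.32704857 | 2238.0        | 6819.0     |
| -122.31457273 | 1650.0        | 9711.0     |
| -122.33659507 | 1780.0        | 8113.0     |
| -122.0308176  | 2390.0        | 7570.0     |
+---------------+---------------+------------+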
[21613 rows x 21 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.

Explore the house data

A scatter plot shows how a house's living area in square feet (sqft_living) relates to its price.


In [8]:
graphlab.canvas.set_target('ipynb')
sales.show(view="Scatter Plot", x="sqft_living", y="price")
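
To put a number on the relationship the scatter plot shows, one can compute the Pearson correlation coefficient. The sketch below is plain Python rather than a GraphLab call, and it uses only the ten rows printed in the head of the SFrame above, so the result describes that small sample and not the full 21,613-row dataset. The helper name pearson is made up for this illustration.

```python
import math

# sqft_living and price for the ten rows shown in the head of `sales`
sqft_living = [1180, 2570, 770, 1960, 1680, 5420, 1715, 1060, 1780, 1890]
price = [221900, 538000, 180000, 604000, 510000,
         1225000, 257500, 291850, 229500, 323000]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r = pearson(sqft_living, price)
print("Pearson r on the 10-row sample: {:.3f}".format(r))
```

A value close to 1 would indicate a strong positive linear relationship, which is what motivates fitting a straight line in the next section.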


Create a simple regression model based on sqft_living

Create our training data and test data


In [9]:
train_data, test_data = sales.random_split(.8, seed=0)
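
The call above puts roughly 80% of the rows in train_data and the rest in test_data, and the fixed seed makes the split reproducible. One way to get such an approximate split is sketched below in plain Python; this is an assumption-laden illustration, not GraphLab's implementation, and the function name random_split_rows is hypothetical.

```python
import random


def random_split_rows(rows, fraction, seed=None):
    """Split rows into two lists; each row goes to the first with probability `fraction`."""
    rng = random.Random(seed)  # private generator, so the global random state is untouched
    first, second = [], []
    for row in rows:
        if rng.random() < fraction:
            first.append(row)
        else:
            second.append(row)
    return first, second


rows = list(range(1000))  # stand-in for the rows of a dataset
train_rows, test_rows = random_split_rows(rows, 0.8, seed=0)
print("train: {}, test: {}".format(len(train_rows), len(test_rows)))
```

Because each row is assigned independently, the first part holds about 80% of the rows rather than exactly 80%, which matches the approximate proportions seen in the training log (17,384 of 21,613 rows).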

Build our regression model


In [10]:
sqft_model = graphlab.linear_regression.create(train_data, target='price',
                                               features=['sqft_living'],
                                               validation_set=None)


Linear regression:
--------------------------------------------------------
Number of examples          : 17384
Number of features          : 1
Number of unpacked features : 1
Number of coefficients      : 2
Starting Newton Method
--------------------------------------------------------
+-----------+----------+--------------+--------------------+---------------+
| Iteration | Passes   | Elapsed Time | Training-max_error | Training-rmse |
+-----------+----------+--------------+--------------------+---------------+
| 1         | 2        | 1.014002     | 4349521.926170     | 262943.613754 |
+-----------+----------+--------------+--------------------+---------------+
SUCCESS: Optimal solution found.
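
With a single feature, ordinary least squares has a closed-form solution: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). The sketch below applies those formulas in plain Python to the ten rows printed in the head of the SFrame. It only illustrates what a one-feature fit computes; it is not GraphLab's solver (the log above reports Newton's method), it ignores any regularization the library may apply by default, and because it sees only ten rows its coefficients will differ from those of sqft_model. The name fit_line is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (single feature)."""
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# sqft_living and price for the ten rows shown in the head of `sales`
sqft_living = [1180, 2570, 770, 1960, 1680, 5420, 1715, 1060, 1780, 1890]
price = [221900, 538000, 180000, 604000, 510000,
         1225000, 257500, 291850, 229500, 323000]

intercept, slope = fit_line(sqft_living, price)
# Residuals of a least-squares fit with an intercept sum to (numerically) zero
residuals = [y - (intercept + slope * x) for x, y in zip(sqft_living, price)]
print("intercept = {:.2f}, slope = {:.2f}".format(intercept, slope))
```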

Evaluate the model

Compute the model's maximum error and RMSE on the test data


In [11]:
print sqft_model.evaluate(test_data)


{'max_error': 4143550.8825285938, 'rmse': 255191.02870527358}

RMSE of about \$255,191!
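
The two numbers returned by evaluate are max_error, the largest absolute difference between a predicted and an actual price, and rmse, the square root of the mean squared difference. Below is a small plain-Python sketch of both metrics, applied to the three actual/predicted price pairs that appear in the prediction cells of this notebook. The function names are hypothetical, and three houses are far too few to reproduce the test-set figures above.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    n = float(len(actual))
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def max_error(actual, predicted):
    """Largest absolute difference between actual and predicted values."""
    return max(abs(a - p) for a, p in zip(actual, predicted))


# Actual and predicted prices for houses 6414100192, 3793500160 and 1321400060
actual = [538000, 323000, 257500]
predicted = [677517.6542563724, 485786.3161435005, 436443.6923644525]

print("rmse      = {:.2f}".format(rmse(actual, predicted)))
print("max_error = {:.2f}".format(max_error(actual, predicted)))
```

RMSE squares each error before averaging, so a few large misses (such as expensive outlier houses) raise it more than many small ones would.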

Import Matplotlib, a plotting library.


In [12]:
import matplotlib.pyplot as plt
%matplotlib inline

In [13]:
plt.plot(test_data['sqft_living'],test_data['price'],'.',
        test_data['sqft_living'],sqft_model.predict(test_data),'-')


Out[13]:
[<matplotlib.lines.Line2D at 0x1ef82ac8>,
 <matplotlib.lines.Line2D at 0x1ef82b70>]

The blue dots are our data, and the green line is the linear regression fit that we will use to predict house prices.

Below are the coefficients of the fitted function.


In [14]:
sqft_model.get('coefficients')


Out[14]:
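+-------------+-------+----------------+---------------+
| name        | index | value          | stderr        |
+-------------+-------+----------------+---------------+
| (intercept) | None  | -47114.0206702 | 4923.34437753 |
| sqft_living | None  | 281.957850166  | 2.16405465323 |
+-------------+-------+----------------+---------------+
[2 rows x 4 columns]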
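
These two coefficients define the fitted line: predicted price = intercept + slope * sqft_living. As a quick check in plain Python (the helper name predict_price is made up here), plugging in the 2,570 sq ft living area of house 6414100192 should give the same value that sqft_model.predict returns for that house further down, up to rounding of the printed coefficients.

```python
INTERCEPT = -47114.0206702  # "(intercept)" row of the coefficients table
SLOPE = 281.957850166       # "sqft_living" row: dollars per additional square foot


def predict_price(sqft_living):
    """Price predicted by the fitted line for a given living area."""
    return INTERCEPT + SLOPE * sqft_living


print("{:.2f}".format(predict_price(2570)))  # about 677517.65
```

Read this way, the slope says that each additional square foot of living area adds roughly \$282 to the predicted price.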

Apply the model to 3 houses


In [15]:
house1 = sales[sales['id']=='5309101200']

In [26]:
house3 = sales[sales['id'] == '6414100192']
house3


Out[26]:
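+------------+---------------------------+--------+----------+-----------+-------------+----------+--------+------------+
| id         | date                      | price  | bedrooms | bathrooms | sqft_living | sqft_lot | floors | waterfront |
+------------+---------------------------+--------+----------+-----------+-------------+----------+--------+------------+
| 6414100192 | 2014-12-09 00:00:00+00:00 | 538000 | 3        | 2.25      | 2570        | 7242     | 2      | 0          |
+------------+---------------------------+--------+----------+-----------+-------------+----------+--------+------------+
+------+-----------+-------+------------+---------------+----------+--------------+---------+-------------+--------------+
| view | condition | grade | sqft_above | sqft_basement | yr_built | yr_renovated | zipcode | lat         | long         |
+------+-----------+-------+------------+---------------+----------+--------------+---------+-------------+--------------+
| 0    | 3         | 7     | 2170       | 400           | 1951     | 1991         | 98125   | 47.72102274 | -122.3188624 |
+------+-----------+-------+------------+---------------+----------+--------------+---------+-------------+--------------+
+---------------+------------+
| sqft_living15 | sqft_lot15 |
+---------------+------------+
| 1690.0        | 7639.0     |
+---------------+------------+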
[? rows x 21 columns]
Note: Only the head of the SFrame is printed. This SFrame is lazily evaluated.
You can use sf.materialize() to force materialization.

In [24]:
print house3['price']


[538000L, ... ]

In [25]:
print sqft_model.predict(house3)


[677517.6542563724]

In [22]:
house2 = sales[sales['id']=='3793500160']

In [23]:
house2['price']


Out[23]:
dtype: int
Rows: ?
[323000L, ... ]

In [24]:
house2


Out[24]:
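+------------+---------------------------+--------+----------+-----------+-------------+----------+--------+------------+
| id         | date                      | price  | bedrooms | bathrooms | sqft_living | sqft_lot | floors | waterfront |
+------------+---------------------------+--------+----------+-----------+-------------+----------+--------+------------+
| 3793500160 | 2015-03-12 00:00:00+00:00 | 323000 | 3        | 2.5       | 1890        | 6560     | 2      | 0          |
+------------+---------------------------+--------+----------+-----------+-------------+----------+--------+------------+
+------+-----------+-------+------------+---------------+----------+--------------+---------+-------------+--------------+
| view | condition | grade | sqft_above | sqft_basement | yr_built | yr_renovated | zipcode | lat         | long         |
+------+-----------+-------+------------+---------------+----------+--------------+---------+-------------+--------------+
| 0    | 3         | 7     | 1890       | 0             | 2003     | 0            | 98038   | 47.36840673 | -122.0308176 |
+------+-----------+-------+------------+---------------+----------+--------------+---------+-------------+--------------+
+---------------+------------+
| sqft_living15 | sqft_lot15 |
+---------------+------------+
| 2390.0        | 7570.0     |
+---------------+------------+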
[? rows x 21 columns]
Note: Only the head of the SFrame is printed. This SFrame is lazily evaluated.
You can use sf.materialize() to force materialization.

In [28]:
print sqft_model.predict(house2)


[485786.3161435005]

In [29]:
house3 = sales[sales['id'] == '1321400060']

In [30]:
print house3['price']


[257500L, ... ]

In [31]:
print sqft_model.predict(house3)


[436443.6923644525]
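
Putting the three results side by side makes it easier to see how far off the model is on these houses. The sketch below only rearranges numbers already printed above (actual and predicted price per house id) and computes the difference in dollars and as a percentage of the actual price; the variable names are illustrative.

```python
# (id, actual price, predicted price), copied from the cells above
houses = [
    ("6414100192", 538000, 677517.6542563724),
    ("3793500160", 323000, 485786.3161435005),
    ("1321400060", 257500, 436443.6923644525),
]

errors = []
for house_id, actual, predicted in houses:
    diff = predicted - actual    # positive means the model over-predicts
    pct = 100.0 * diff / actual  # error relative to the actual price
    errors.append((house_id, diff, pct))
    print("{}: predicted - actual = {:.2f} ({:.1f}% of actual)".format(
        house_id, diff, pct))
```

All three differences are positive, so the one-feature model over-predicts each of these houses, which suggests that living area alone misses factors that affect price.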
